Shallow Parsing of Spoken Estonian Using Constraint Grammar

نویسندگان

  • Kaili Müürisep
  • Heli Uibo
چکیده

In this paper we describe how we have adapted the syntactic analyzer of written Estonian to the spoken language. The Constraint Grammar shallow syntactic parser (Müürisep et al. 2003) was used for the automatic syntactic analysis of the corpus of Estonian spoken language (Hennoste et al. 2000). To adapt the parser, the clause boundary detection rules as well as some syntactic constraints had to be changed. Two new syntactic tags were also introduced. In the paper the introduced changes are described and the achieved results are analyzed. The parser determined the syntactic label unambiguosly for 90% of the words in the text in average, using the manually morphologically disambiguated text as an input. The error rate was less than 3%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Disfluency Detection and Parsing of Transcribed Speech of Estonian

The paper introduces our strategy for adapting a rule based parser of written language to transcribed speech. Special attention has been paid to disfluencies (repairs, repetitions and false starts). A Constraint Grammar based parser was used for shallow syntactic analysis of spoken Estonian. The modification of grammar and additional methods improved the recall from 97.5% to 97.7% and precision...

متن کامل

Parsing Manually Detected and Normalized Disfluencies in Spoken Estonian

An experiment with an Estonian Constraint Grammar based syntactic analyzer is conducted, analyzing transcribed speech. In this paper the problems encountered during parsing disfluencies are analyzed. In addition, the amount by which the manual normalization of disfluencies improved the results of recall and precision was compared to non-normalized utterances.

متن کامل

Parsing Estonian with Constraint Grammar

This paper describes the current state of syntactic analysis of Estonian using Constraint Grammar, focusing mainly on the determination of syntactic functions. Constraint Grammar of Estonian was written in 1996-2000 at the University of Tartu. The author has developed its syntactic part.

متن کامل

Syntactically annotated corpora of Estonian

Syntactically annotated corpora are needed 1) to train and test parsers and various language technological products grammar checkers, information retrievers and extractors, machine translators etc; 2) to check the agreement of existing linguistic theories with the real language usage. The corpora can be annotated on different levels of depth. In shallow syntactically annotated corpora a syntact...

متن کامل

Determination of Syntactic Functions in Estonian Constraint Grammar

This article describes the current state of syntactic analysis of Estonian using Constraint Grammar. Constraint Grammar framework divides parsing into two different modules: morphological disambiguation and determination of syntactic functions. This article focuses on the last module in detail. If the morphological disambiguator achieves the precision more than 85% and error rate is smaller tha...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005